Motivation, emotion, and action are inter-related essential factors in human activities. While motivation and emotion have long been considered central to exploring how people take action, little research has supported analyzing the relationship between human mental states and actions. We present the first study that investigates the viability of modeling motivation, emotion, and action in language-based human activities, named COMMA (Cognitive Framework of Human Activities). Guided by COMMA, we define three natural language processing tasks (emotion understanding, motivation understanding, and conditioned action generation), and build a challenging dataset, Hail, by automatically extracting samples from StoryCommonsense. Experimental results on NLP applications prove the effectiveness of modeling these relationships. Furthermore, compared with existing methods, models inspired by COMMA can better reveal the essential relationships among motivations, emotions, and actions.
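To make the three task formulations concrete, here is a minimal sketch of how a Hail-style sample and the three COMMA tasks might be laid out; the field names and example labels are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class HailSample:
    """Hypothetical layout of one sample; field names are assumptions."""
    context: str     # story sentences describing the situation
    character: str   # the character whose mental state is annotated
    motivation: str  # why the character acts
    emotion: str     # how the character feels
    action: str      # what the character does next

sample = HailSample(
    context="Jane studied all night for her final exam.",
    character="Jane",
    motivation="to pass the exam",
    emotion="anxious",
    action="Jane reviewed her notes one more time before class.",
)

# The three COMMA tasks, phrased as input -> output mappings:
emotion_understanding    = (sample.context, sample.character)   # -> emotion
motivation_understanding = (sample.context, sample.character)   # -> motivation
conditioned_generation   = (sample.context, sample.motivation,
                            sample.emotion)                     # -> action
```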
Convolution-based methods provide good segmentation performance on medical image segmentation tasks. However, these methods face the following challenges when dealing with the edges of medical images: (1) previous convolution-based methods do not attend to the boundary relationship between foreground and background around the segmentation edge, which degrades segmentation performance when the edge varies; (2) the inductive bias of convolutional layers cannot adapt to complex edge variations or to the aggregation of multiple segmented regions, so their performance improvements are mostly limited to the interior of segmented regions rather than the edges. To address these challenges, we propose the CM-MLP framework, built on an MFI (Multi-scale Feature Interaction) block and an ACRE (Axial Context Relation Encoder) block, to accurately segment the edges of medical images. In the MFI block, we propose a cascade multi-scale MLP (Cascade MLP) to process all local information from the deeper layers of the network simultaneously, and use the cascade multi-scale mechanism to fuse discrete local information gradually. The ACRE block is then used to make deep supervision focus on exploring the boundary relationship between foreground and background, so as to refine the edges of medical images. The segmentation accuracy (Dice) of our proposed CM-MLP framework reaches 96.96%, 96.76%, and 82.54% on three benchmark datasets (the CVC-ClinicDB dataset, the sub-Kvasir dataset, and our in-house dataset), respectively, outperforming state-of-the-art methods. The source code and trained models will be available at https://github.com/programmerhyy/cm-mlp.
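As a rough illustration of the cascade multi-scale fusion idea, the sketch below progressively merges feature maps from deeper stages using per-pixel MLP (1x1 convolution) mixing; the module structure, channel sizes, and fusion order are assumptions for illustration, not the authors' exact MFI design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeMultiScaleMLP(nn.Module):
    """Sketch of cascaded multi-scale fusion with per-pixel MLPs.

    Deeper (coarser) features are upsampled and merged into shallower
    ones stage by stage. Channel counts are illustrative assumptions.
    """
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        # A 1x1 conv acts as a per-pixel MLP mixing channels after each fusion.
        self.mix = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c_deep + c_shallow, c_shallow, kernel_size=1),
                nn.GELU(),
            )
            for c_shallow, c_deep in zip(channels[:-1], channels[1:])
        )

    def forward(self, feats):
        # feats: maps from shallow to deep, e.g. shapes
        # [(B,64,H,W), (B,128,H/2,W/2), (B,256,H/4,W/4)]
        fused = feats[-1]
        for i in range(len(feats) - 2, -1, -1):
            up = F.interpolate(fused, size=feats[i].shape[-2:],
                               mode="bilinear", align_corners=False)
            fused = self.mix[i](torch.cat([feats[i], up], dim=1))
        return fused  # fused map at the shallowest resolution

# Usage on dummy multi-scale features:
block = CascadeMultiScaleMLP()
feats = [torch.randn(1, 64, 64, 64),
         torch.randn(1, 128, 32, 32),
         torch.randn(1, 256, 16, 16)]
out = block(feats)  # -> (1, 64, 64, 64)
```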
The persuasion strategy recognition task requires a system to identify the strategies adopted by the persuader from the dialogue. However, previous methods focus mainly on contextual information, and little is known about incorporating psychological feedback, i.e., the persuadee's emotions, to predict strategies. In this paper, we propose a Cross-channel Feedback memOry Network (CFO-Net) that leverages emotional feedback to iteratively measure the potential benefit of each strategy and incorporate it into context-aware dialogue information. Specifically, CFO-Net designs a feedback memory module, consisting of a strategy pool and a feedback pool, to obtain emotion-aware strategy representations. The strategy pool stores historical strategies, and the feedback pool is used to obtain updated strategy weights based on feedback emotional information. Furthermore, a cross-channel fusion predictor is developed to enable mutual interaction between the emotion-aware strategy representation and the context-aware dialogue information for strategy recognition. Experimental results on \textsc{PersuasionForGood} confirm that the proposed CFO-Net effectively improves M-F1 performance from 61.74 to 65.41.
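A minimal sketch of what such a feedback memory might look like, assuming the strategy pool is a learned embedding table and emotion feedback re-weights it via attention; all shapes and the weighting scheme are illustrative guesses, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FeedbackMemory(nn.Module):
    """Sketch: emotion feedback re-weights a pool of strategy embeddings."""
    def __init__(self, num_strategies=10, dim=256, emo_dim=128):
        super().__init__()
        self.strategy_pool = nn.Embedding(num_strategies, dim)  # historical strategies
        self.feedback_proj = nn.Linear(emo_dim, dim)            # emotion -> strategy space

    def forward(self, emotion_feedback):
        # emotion_feedback: (B, emo_dim) encoding of the persuadee's emotion
        query = self.feedback_proj(emotion_feedback)             # (B, dim)
        weights = torch.softmax(
            query @ self.strategy_pool.weight.T, dim=-1)         # (B, num_strategies)
        # Emotion-aware strategy representation: feedback-weighted sum
        return weights @ self.strategy_pool.weight               # (B, dim)

memory = FeedbackMemory()
emo = torch.randn(4, 128)      # batch of emotion feedback encodings
strategy_repr = memory(emo)    # (4, 256), fused with dialogue context downstream
```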
Emotional support conversation aims at reducing the emotional distress of the help-seeker, which is a new and challenging task. It requires the system to explore the cause of the help-seeker's emotional distress and understand their psychological intention in order to provide supportive responses. However, existing methods mainly focus on sequential contextual information, ignoring the hierarchical relationships between the global cause and the local psychological intentions behind conversations, which leads to a weak ability to provide emotional support. In this paper, we propose a Global-to-Local Hierarchical Graph Network (GLHG) to capture multi-source information (global cause, local intentions, and dialog history) and model the hierarchical relationships between them; it consists of a multi-source encoder, a hierarchical graph reasoner, and a global-guide decoder. Furthermore, a novel training objective is designed to monitor the semantic information of the global cause. Experimental results on the emotional support conversation dataset ESConv confirm that the proposed GLHG achieves state-of-the-art performance on both automatic and human evaluations. The code will be released here\footnote{\small{~https://github.com/pengwei-iie/GLHG}}.
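As a rough sketch of how a global-to-local hierarchical graph could be wired, the snippet below builds a tiny graph in which a global-cause node feeds an intention node and utterance nodes, and runs one round of message passing; the topology and the plain GCN-style update are assumptions, not GLHG's actual reasoner.

```python
import torch
import torch.nn as nn

def hierarchical_graph_step(nodes, adj, proj):
    """One message-passing step: aggregate neighbors, then transform."""
    deg = adj.sum(-1, keepdim=True).clamp(min=1)   # avoid divide-by-zero
    agg = (adj @ nodes) / deg                      # mean over neighbors
    return torch.relu(proj(agg) + nodes)           # residual update

dim = 256
# Node 0: global cause; node 1: local intention; nodes 2-4: utterances.
nodes = torch.randn(5, dim)
adj = torch.tensor([
    [1, 1, 1, 1, 1],   # cause connects to everything (global -> local)
    [1, 1, 1, 1, 1],   # intention also sees all utterances
    [1, 1, 1, 1, 0],   # utterances see cause, intention, and neighbors
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
], dtype=torch.float)
proj = nn.Linear(dim, dim)
updated = hierarchical_graph_step(nodes, adj, proj)  # (5, 256)
# `updated` node states would then condition the response decoder.
```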
Medication recommendation is a crucial task in intelligent healthcare systems. Previous studies mainly recommend medications using electronic health records (EHRs). However, some details of the interaction between doctors and patients may be ignored or omitted in EHRs, even though they are crucial for automatic medication recommendation. Therefore, we make the first attempt to recommend medications based on the dialogues between doctors and patients. In this work, we construct DialMed, the first high-quality dataset for medical-dialogue-based medication recommendation. It contains 11,996 medical dialogues related to 16 common diseases from 3 departments and 70 corresponding common medications. Furthermore, we propose a Dialogue structure and Disease knowledge aware Network (DDN), in which a QA dialogue graph mechanism is designed to model the dialogue structure, and a knowledge graph is used to introduce external disease knowledge. Extensive experimental results demonstrate that the proposed method is a promising solution for recommending medications from medical dialogues. The dataset and code are available at https://github.com/f-window/dialmed.
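To make the setup concrete, here is a minimal sketch of a DDN-style predictor that combines a dialogue encoding with a knowledge-graph disease embedding for multi-label medication prediction; the simple concatenation fusion and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DrugRecommender(nn.Module):
    """Sketch: dialogue encoding + KG disease embedding -> drug scores."""
    def __init__(self, dialog_dim=768, kg_dim=128, num_drugs=70):
        super().__init__()
        self.fuse = nn.Linear(dialog_dim + kg_dim, 256)
        self.head = nn.Linear(256, num_drugs)  # one score per candidate drug

    def forward(self, dialog_repr, disease_emb):
        h = torch.relu(self.fuse(torch.cat([dialog_repr, disease_emb], -1)))
        return torch.sigmoid(self.head(h))  # multi-label probabilities

model = DrugRecommender()
dialog_repr = torch.randn(2, 768)   # e.g., pooled encoder output of the dialogue
disease_emb = torch.randn(2, 128)   # e.g., KG embedding of the mentioned disease
probs = model(dialog_repr, disease_emb)   # (2, 70), thresholded to recommend
```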
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot benefit, or benefit only marginally, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, and sequential distillation, revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; and 3) weak regularization is preferred. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification with the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, namely by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
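A minimal sketch of what distilling token relations could look like: pairwise token-to-token similarity maps are computed for teacher and student and matched with a KL objective. Using plain feature similarities (rather than TinyMIM's exact Q/K/V relations) and a single temperature are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def token_relation_loss(student_tokens, teacher_tokens, tau=1.0):
    """Match pairwise token-relation maps between student and teacher.

    student_tokens, teacher_tokens: (B, N, D) patch-token features.
    Relations are (B, N, N), so teacher and student widths may differ.
    """
    def relations(x):
        x = F.normalize(x, dim=-1)
        return x @ x.transpose(1, 2) / tau   # (B, N, N) similarity logits

    s = F.log_softmax(relations(student_tokens), dim=-1)
    t = F.softmax(relations(teacher_tokens), dim=-1)
    return F.kl_div(s, t, reduction="batchmean")

# Usage with dummy features (the teacher is frozen in practice):
student = torch.randn(2, 196, 384, requires_grad=True)  # ViT-Small-width tokens
teacher = torch.randn(2, 196, 768)                      # ViT-Base-width tokens
loss = token_relation_loss(student, teacher)
loss.backward()
```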
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
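As a rough sketch of the style-aware adaptation idea, the feed-forward layer below lets a style code modulate its hidden activations through predicted per-channel gains; the exact modulation scheme in the paper's style-aware adaptive transformer may differ, and all dimensions here are assumptions.

```python
import torch
import torch.nn as nn

class StyleAdaptiveFeedForward(nn.Module):
    """Sketch: a style code re-scales the feed-forward layer per channel."""
    def __init__(self, dim=256, hidden=1024, style_dim=128):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        # Predict per-channel gains for the hidden layer from the style code.
        self.to_gain = nn.Linear(style_dim, hidden)

    def forward(self, x, style_code):
        # x: (B, T, dim) content features; style_code: (B, style_dim)
        gain = 1.0 + torch.tanh(self.to_gain(style_code))   # (B, hidden)
        h = torch.relu(self.fc1(x)) * gain.unsqueeze(1)     # style-modulated
        return self.fc2(h)

ff = StyleAdaptiveFeedForward()
content = torch.randn(2, 50, 256)   # audio-driven content features
style = torch.randn(2, 128)         # style code from the reference video
out = ff(content, style)            # (2, 50, 256)
```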
Decompilation aims to transform a low-level programming language (LPL) (e.g., a binary file) into its functionally equivalent high-level programming language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and to improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
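For intuition, the sketch below encodes a toy instruction-level graph with a GNN-style update and scores IR tokens per node; the graph construction, vocabularies, and greedy decoding are all illustrative assumptions rather than NeurDP's actual pipeline.

```python
import torch
import torch.nn as nn

# Toy "binary": each node is one low-level instruction (IDs into a vocab),
# edges follow control/data flow. All of this is an illustrative assumption.
instr_ids = torch.tensor([3, 7, 7, 1])            # 4 instructions
adj = torch.tensor([[1, 1, 0, 0],
                    [0, 1, 1, 0],
                    [0, 0, 1, 1],
                    [0, 0, 0, 1]], dtype=torch.float)

embed = nn.Embedding(64, 128)                     # instruction vocab -> vectors
gnn = nn.Linear(128, 128)
ir_head = nn.Linear(128, 32)                      # 32 hypothetical IR tokens

h = embed(instr_ids)                              # (4, 128)
for _ in range(2):                                # 2 rounds of message passing
    h = torch.relu(gnn(adj @ h) + h)
ir_logits = ir_head(h)                            # per-instruction IR scores
ir_tokens = ir_logits.argmax(-1)                  # greedy IR "decoding"
```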
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
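Since the abstract centers on the new GRN layer, here is a sketch of global response normalization as described for ConvNeXt V2: aggregate a global per-channel response, normalize it across channels, and use it to calibrate the features; minor details such as epsilon placement and initialization are assumptions.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization for channels-last tensors (B, H, W, C).

    Three steps: global feature aggregation (per-channel L2 norm over space),
    divisive normalization across channels, and feature calibration with a
    residual connection.
    """
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps

    def forward(self, x):
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)      # (B,1,1,C) aggregation
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)   # cross-channel norm
        return self.gamma * (x * nx) + self.beta + x           # calibrate + residual

grn = GRN(dim=96)
x = torch.randn(2, 14, 14, 96)    # channels-last feature map
y = grn(x)                        # same shape; encourages channel competition
```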